Complementing WordNet with Roget's and Corpus-based Thesauri for Information Retrieval
نویسندگان
چکیده
This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget 's thesaurus and corpus-derived thesauri. Words and relations which are not included in WordNet can be found in the corpus-derived thesauri. Effects of polysemy can be minimized with weighting method considering all query terms and all of the thesauri. Experimental results show that our method enhances information retrieval performance significantly. Department of Computer Science Tokyo Institute of Technology 2-12-1 Oookayama Meguro-Ku Tokyo 152-8522 Japan {rila,take,tanaka}@cs.titech.ac.jp expansion (Voorhees, 1994; Smeaton and Berrut, 1995), computing lexical cohesion (Stairmand, 1997), word sense disambiguation (Voorhees, 1993), and so on, but the results have not been very successful. Previously, we conducted query expansion experiments using WordNet (Mandala et al., to appear 1999) and found limitations, which can be summarized as follows :
منابع مشابه
Evaluation of Automatic Updates of Roget's Thesaurus
abstract Keywords: lexical resources, Roget's Thesaurus, WordNet, semantic relatedness, synonym selection, pseudo-word-sense disambiguation, analogy Thesauri and similarly organised resources attract increasing interest of Natural Language Processing researchers. Thesauri age fast, so there is a constant need to update their vocabulary. Since a manual update cycle takes considerable time, autom...
متن کاملTowards a Conceptual Representation of Lexical Meaning in WordNet
Knowledge acquisition is an essential and intractable task for almost every natural language processing study. To date, corpus approach is a primary means for acquiring lexical-level semantic knowledge, but it has the problem of knowledge insufficiency when training on various kinds of corpora. This paper is concerned with the issue of how to acquire and represent conceptual knowledge explicitl...
متن کاملCombining General Hand-Made and Automatically Constructed Thesauri for Query Expansion in Information Retrieval
One of the most intuitive ideas for enhancing the effectiveness of an information retrieval system is to include the use of a thesaurus. WordNet, as a hand-crafted and general-purpose thesaurus, intuitively should also work fine in information retrieval, but unfortunately, experimental results by many researchers have not been promising. Thereby in this paper we investigate why the use of WordN...
متن کاملEvaluating Roget's Thesauri
Roget’s Thesaurus has gone through many revisions since it was first published 150 years ago. But how do these revisions affect Roget’s usefulness for NLP? We examine the differences in content between the 1911 and 1987 versions of Roget’s, and we test both versions with each other and WordNet on problems such as synonym identification and word relatedness. We also present a novel method for me...
متن کاملMapping Roget's Thesaurus and WordNet to French
Roget’s Thesaurus and WordNet are very widely used lexical reference works. We describe an automatic mapping procedure that effectively produces French translations of the terms in these two resources. Our approach to the challenging task of disambiguation is based on structural statistics as well as measures of semantic relatedness that are utilized to learn a classification model for associat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999